Doubly stochastic large scale kernel learning with the empirical kernel map

Authors

  • Nikolaas Steenbergen
  • Sebastian Schelter
  • Felix Bießmann
Abstract

With the rise of big data sets, the popularity of kernel methods has declined and neural networks have taken over again. The main problem with kernel methods is that the kernel matrix grows quadratically with the number of data points. Most attempts to scale up kernel methods address this problem by discarding data points or basis functions of some approximation of the kernel map. Here we present a simple yet effective alternative for scaling up kernel methods that takes the entire data set into account via doubly stochastic optimization of the empirical kernel map. The algorithm is straightforward to implement, in particular in parallel execution settings; it leverages the full power and versatility of classical kernel functions without the need to explicitly formulate a kernel map approximation. We provide empirical evidence that the algorithm works on large data sets.
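To make the idea concrete, the sketch below illustrates doubly stochastic optimization of the empirical kernel map for RBF-kernel least-squares regression: each step draws one minibatch to estimate the loss gradient and an independent minibatch of training points as a stochastic basis, so only small kernel blocks are ever computed. This is a minimal illustration under these assumptions, not the authors' exact algorithm; the names `rbf` and `fit_dskl` and all hyperparameter values are illustrative.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel block k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_dskl(X, y, steps=2000, batch=32, basis=32, lr=0.5, lam=1e-4, seed=0):
    """Doubly stochastic SGD on the empirical kernel map f(x) = sum_j alpha_j k(x_j, x)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    alpha = np.zeros(n)                  # one dual coefficient per data point
    for t in range(steps):
        i = rng.choice(n, batch)         # minibatch for the loss gradient
        j = rng.choice(n, basis)         # independent minibatch of basis points
        K = rbf(X[i], X[j])              # only a (batch x basis) kernel block
        resid = K @ alpha[j] - y[i]      # residual, expansion restricted to sampled basis
        grad = K.T @ resid / batch + lam * alpha[j]
        alpha[j] -= lr / np.sqrt(t + 1) * grad  # update only the sampled coordinates
    return alpha

# Usage on a toy 1-d regression problem.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
alpha = fit_dskl(X, y)
y_hat = rbf(X, X) @ alpha               # predict with the full empirical kernel map
```

Because every update touches only a batch-by-basis kernel block, the per-step cost is independent of the data set size, while the basis indices range over the entire data set across iterations.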


Similar papers

Scalable Kernel Embedding of Latent Variable Models

Kernel embedding of distributions maps distributions to the reproducing kernel Hilbert space (RKHS) of a kernel function, such that subsequent manipulations of distributions can be achieved via RKHS distances, linear and multilinear transformations, and spectral analysis. This framework has led to simple and effective nonparametric algorithms in various machine learning problems, such as featur...
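As a concrete illustration of the framework described above, the following sketch (an illustrative toy, not code from the paper) embeds two samples via the empirical kernel mean map and compares the underlying distributions through the RKHS distance between the embeddings, i.e. the maximum mean discrepancy (MMD):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel block k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Squared RKHS distance ||mu_X - mu_Y||^2 between empirical mean embeddings."""
    return (rbf(X, X, gamma).mean()
            - 2.0 * rbf(X, Y, gamma).mean()
            + rbf(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 2))   # draws from N(0, I)
Y = rng.normal(0.5, 1.0, (200, 2))   # draws from a shifted distribution
print(mmd2(X, X[:100]), mmd2(X, Y))  # same distribution: near 0; shifted: clearly > 0
```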


Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods

To address the scalability issue of kernel methods, random features are commonly used for kernel approximation (Rahimi and Recht, 2007). They map the input data to a randomized low-dimensional feature space and apply fast linear learning algorithms on it. However, to achieve high precision results, one might still need a large number of random features, which is infeasible in large-scale applica...
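The random-feature construction referenced here can be sketched in a few lines. The code below is a generic illustration of random Fourier features for the RBF kernel, with illustrative function names and parameter choices; the inner product of the features approximates the exact kernel:

```python
import numpy as np

def random_fourier_features(X, D=500, gamma=1.0, seed=0):
    """z(x) = sqrt(2/D) * cos(x @ W + b), so z(x).z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # spectral samples of the RBF kernel
    b = rng.uniform(0, 2 * np.pi, D)                       # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Compare the feature-space inner products against the exact RBF kernel.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, D=2000)
approx = Z @ Z.T
exact = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(approx - exact).max())   # small for large D
```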


Scalable Kernel Methods via Doubly Stochastic Gradients

The general perception is that kernel methods are not scalable, so neural nets become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called “doubly stochastic functional gradients”. Based on the fact that many kernel methods can be expressed as convex ...
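One minimal reading of "doubly stochastic functional gradients" is sketched below: each iteration samples both a random training point and a random Fourier feature, and stores a single new coefficient, so neither the kernel matrix nor a fixed feature map is ever materialized. This is a simplified illustration under stated assumptions (RBF kernel, squared loss, no regularization), not the exact algorithm from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (1000, 1))
y = np.sin(X[:, 0])

gamma, T = 1.0, 500
# one random Fourier feature (w_t, b_t) is drawn per iteration
W = rng.normal(scale=np.sqrt(2 * gamma), size=(T, X.shape[1]))
b = rng.uniform(0, 2 * np.pi, T)
alpha = np.zeros(T)

def predict(x, t):
    """f_t(x) = sum_{s < t} alpha_s * sqrt(2) * cos(w_s . x + b_s)."""
    return alpha[:t] @ (np.sqrt(2) * np.cos(W[:t] @ x + b[:t]))

for t in range(T):
    i = rng.integers(len(X))                        # random data point
    phi = np.sqrt(2) * np.cos(W[t] @ X[i] + b[t])   # random feature evaluated at x_i
    resid = predict(X[i], t) - y[i]                 # squared-loss derivative
    alpha[t] = -(2.0 / (t + 1)) * resid * phi       # append one new coefficient

# After training, predict(x_new, T) evaluates the learned function.
```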


Triply Stochastic Gradients on Multiple Kernel Learning

Multiple Kernel Learning (MKL) is highly useful for learning complex data with multiple cues or representations. However, MKL is known to have poor scalability because of the expensive kernel computation. Dai et al. (2014) proposed to use a doubly stochastic gradient descent algorithm (doubly SGD) to greatly improve the scalability of kernel methods. However, the algorithm is not suitable for MK...


Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients

Nonlinear component analysis methods such as kernel Principal Component Analysis (KPCA) and kernel Canonical Correlation Analysis (KCCA) are widely used in machine learning, statistics and data analysis, and they serve as invaluable preprocessing tools for various purposes such as data exploration, dimension reduction and feature extraction. However, existing algorithms for nonlinear co...



Journal:
  • CoRR

Volume: abs/1609.00585

Pages: -

Publication date: 2016